[MachineLearning] Using tesseract

Machine Learning

Published: 2018-05-04

The tesseract project

An open-source OCR engine whose development has been sponsored by Google; see the project README for details.

https://github.com/tesseract-ocr/tesseract

Installation

https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation

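First, install the required libraries:

sudo apt-get install autoconf-archive automake g++ libtool libleptonica-dev make pkg-config

Then, from the directory that contains the tesseract source checkout, run: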

cd tesseract-ocr
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

The configure step fails with this error:

configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.

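Checking the locally installed Leptonica showed that it was version 1.73. Looking it up turned up the explanation below: 1.74 has to be downloaded and built from source.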

Tesseract versions and the minimum version of Leptonica required:
Tesseract  Leptonica  Ubuntu
4.00       1.74.2     Must build from source
3.05       1.74.0     Must build from source
3.04       1.71       Ubuntu 16.04 <http://packages.ubuntu.com/xenial/libtesseract3>
3.03       1.70       Ubuntu 14.04 <http://packages.ubuntu.com/trusty/libtesseract3>
3.02       1.69       Ubuntu 12.04 <http://packages.ubuntu.com/precise/libtesseract3>
3.01       1.67
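
Before running configure, the installed Leptonica version can be compared against the minimum from the table. The following is a minimal sketch, not part of the official build: the helper names version_ge and min_leptonica are made up for illustration, the comparison relies on GNU sort's -V option, and "lept" is assumed to be the pkg-config module name that Leptonica registers.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the version-sort option (-V) of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Leptonica version for a Tesseract release, copied from the table.
# Hypothetical helper; returns non-zero for releases the table does not list.
min_leptonica() {
    case "$1" in
        4.00) echo 1.74.2 ;;
        3.05) echo 1.74.0 ;;
        3.04) echo 1.71 ;;
        3.03) echo 1.70 ;;
        3.02) echo 1.69 ;;
        3.01) echo 1.67 ;;
        *) return 1 ;;
    esac
}

required="$(min_leptonica 4.00)"
# Assumption: the Leptonica development package registers itself as "lept".
installed="$(pkg-config --modversion lept 2>/dev/null || true)"

if [ -z "$installed" ]; then
    echo "Leptonica not found via pkg-config"
elif version_ge "$installed" "$required"; then
    echo "Leptonica $installed is new enough (need $required)"
else
    echo "Leptonica $installed is older than $required; build it from source"
fi
```

With the 1.73 package mentioned above, this would take the last branch, which matches the configure error.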

Installing leptonica 1.74

wget http://www.leptonica.com/source/leptonica-1.74.4.tar.gz

tar xvf leptonica-1.74.4.tar.gz
cd leptonica-1.74.4

./configure
make
sudo make install

Once that succeeds, continue with the tesseract installation.
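
The archive name and the source directory name in the Leptonica steps both depend on the version number, so deriving them from a single variable keeps them in sync. This is only a sketch: it prints the commands instead of running them, the variable names are my own, and it assumes the tarball unpacks into a directory named leptonica-<version>.

```shell
LEPT_VERSION="1.74.4"                        # the version downloaded with wget
archive="leptonica-${LEPT_VERSION}.tar.gz"
url="http://www.leptonica.com/source/${archive}"
srcdir="${archive%.tar.gz}"                  # assumption: tarball unpacks to leptonica-<version>

# Dry run: print each step instead of executing it.
for step in \
    "wget ${url}" \
    "tar xvf ${archive}" \
    "cd ${srcdir}" \
    "./configure" \
    "make" \
    "sudo make install"
do
    echo "$step"
done
```

Changing LEPT_VERSION is then the only edit needed to build a different release.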

Running tesseract

tesseract digits1.png result -l chi_sim

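Command arguments:

digits1.png   the image file to recognize
result        base name of the output file; tesseract writes the recognized text to result.txt
-l chi_sim    the language to recognize; chi_sim is Simplified Chinese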
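
The three arguments can be kept in variables to make it clearer where the text ends up. A small sketch with variable names of my own choosing; it only calls tesseract when the binary and the image are both present, and otherwise just prints what it would do.

```shell
image="digits1.png"
outbase="result"
lang="chi_sim"
outfile="${outbase}.txt"   # tesseract appends .txt to the output base name

if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    # Recognize the image, then show the text that was written.
    tesseract "$image" "$outbase" -l "$lang" && cat "$outfile"
else
    echo "would run: tesseract $image $outbase -l $lang (output in $outfile)"
fi
```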

This fails with:

Error opening data file /usr/local/share//tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.

The data path needs to be set:

export TESSDATA_PREFIX=/usr/local/share/tessdata/

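Then download the data for the languages you need from git@github.com:tesseract-ocr/tessdata.git; for Chinese, take the files whose names start with chi. Copy the data into the TESSDATA_PREFIX directory and run the recognition command again.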
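
Before rerunning the command, it can help to confirm that the language file really is where tesseract will look. A minimal sketch, assuming the data directory is the one exported above; has_language is a hypothetical helper, not a tesseract command.

```shell
# has_language DIR LANG: succeeds when DIR contains LANG.traineddata.
# A single trailing slash on DIR is tolerated.
has_language() {
    [ -f "${1%/}/$2.traineddata" ]
}

# Fall back to the path used in this post when the variable is not set.
datadir="${TESSDATA_PREFIX:-/usr/local/share/tessdata/}"
lang="chi_sim"

if has_language "$datadir" "$lang"; then
    echo "$lang.traineddata found in $datadir"
else
    echo "$lang.traineddata is missing from $datadir"
fi
```

Once tesseract itself runs, tesseract --list-langs gives the same information from the engine's point of view.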

Result

0
电 话 18663778972

全 国 朝 号 2012127

&) H: 02 04 12 13 16 26

标 | 标标 _

Wossoneri

http://wossoneri.github.io/2018/05/04/[MachineLearning]tesseract-tutorial/

Unless otherwise stated, all articles on this blog are licensed under CC BY-NC 4.0. Please credit Wossoneri when reposting!

OCR tesseract
